Storage Commands

Google Cloud Datalab provides a set of commands for working with data stored in Google Cloud Storage. They can help you work with data files that are not stored in BigQuery, or manage the data you import into or export from BigQuery.

This notebook introduces several of the Cloud Storage commands that Datalab adds to the notebook environment.

The Commands

The commands can list storage buckets and the objects they contain, manage those objects, and read from and write to them.


In [4]:
%%gcs --help


usage: gcs [-h] {copy,create,delete,list,read,view,write} ...

Execute various Google Cloud Storage related operations. Use "%gcs <command>
-h" for help on a specific command.

positional arguments:
  {copy,create,delete,list,read,view,write}
                        commands
    copy                Copy one or more Google Cloud Storage objects to a
                        different location.
    create              Create one or more Google Cloud Storage buckets.
    delete              Delete one or more Google Cloud Storage buckets or
                        objects.
    list                List buckets in a project, or contents of a bucket.
    read                Read the contents of a Google Cloud Storage object
                        into a Python variable.
    view                View the contents of a Google Cloud Storage object.
    write               Write the value of a Python variable to a Google Cloud
                        Storage object.

optional arguments:
  -h, --help            show this help message and exit

Buckets and Objects

Items or files held in Cloud Storage are called objects. These objects are immutable once written. They are organized into buckets.
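To make the naming model concrete, here is a small illustrative snippet (not part of the original walkthrough); the bucket and object key come from the sample data used below:


In [ ]:
# A bucket is a flat namespace: the slashes in an object key only look like
# folders -- the key is a single opaque name within the bucket.
bucket_name = 'cloud-datalab-samples'
object_key = 'carprices/training.csv'
print('gs://' + bucket_name + '/' + object_key)
# gs://cloud-datalab-samples/carprices/training.csv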

Listing

First, a couple of commands to list Datalab sample data. Try %%gcs list without arguments to list all buckets within the current project:


In [ ]:
%%gcs list

In [5]:
%%gcs list --objects gs://cloud-datalab-samples


Out[5]:
Name | Type | Size (bytes) | Updated
applogs | application/octet-stream | 506050 | 2015-11-24 00:06:07.588000+00:00
carprices/testing.csv | text/csv | 3635 | 2015-10-06 09:02:03.638000+00:00
carprices/training.csv | text/csv | 15018 | 2015-10-06 09:01:46.040000+00:00
cars.csv | text/csv | 248 | 2015-10-05 04:58:10.481000+00:00
cars2.csv | text/csv | 92 | 2015-10-05 05:41:30.935000+00:00
census/ | application/x-www-form-urlencoded;charset=UTF-8 | 0 | 2017-03-05 05:51:55.107000+00:00
census/ACS2014_PUMS_README.pdf | application/pdf | 289316 | 2017-03-05 05:52:31.193000+00:00
census/ss14psd.csv | binary/octet-stream | 8189323 | 2017-03-05 05:53:54.728000+00:00
hello.txt | text/plain | 14 | 2015-10-05 04:48:39.433000+00:00
httplogs/logs20140615.csv | text/csv | 23799981 | 2015-10-06 08:39:42.605000+00:00
httplogs/logs20140616.csv | text/csv | 86323745 | 2015-10-06 08:39:43.067000+00:00
httplogs/logs20140617.csv | text/csv | 51282558 | 2015-10-06 08:39:43.622000+00:00
httplogs/logs20140618.csv | text/csv | 53380318 | 2015-10-06 08:39:44.191000+00:00
httplogs/logs20140619.csv | text/csv | 87691363 | 2015-10-06 08:39:44.794000+00:00
httplogs/logs20140620.csv | text/csv | 47229334 | 2015-10-06 08:39:45.236000+00:00
httplogs/logs_sample.csv | text/csv | 3949 | 2015-10-06 08:39:45.729000+00:00
stackdriver-monitoring/timeseries/per-zone-weekly-20161010.csv | text/csv | 8725 | 2016-10-13 16:09:21.470000+00:00
stackdriver-monitoring/timeseries/topic-message-sizes-20161208.csv | text/csv | 114702 | 2016-12-08 21:36:21.500000+00:00
udfsample/ | application/x-www-form-urlencoded;charset=UTF-8 | 0 | 2015-11-23 23:57:38.494000+00:00
udfsample/2015_station_data.csv | text/csv | 4230 | 2015-11-24 00:20:14.575000+00:00

You can also use wildcards to list all objects that match a pattern:


In [9]:
%%gcs list --objects gs://cloud-datalab-samples/udf*


Out[9]:
Name | Type | Size (bytes) | Updated
udfsample/ | application/x-www-form-urlencoded;charset=UTF-8 | 0 | 2015-11-23 23:57:38.494000+00:00
udfsample/2015_station_data.csv | text/csv | 4230 | 2015-11-24 00:20:14.575000+00:00
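If you prefer plain Python over cell magics, the standalone google-cloud-storage client library offers an equivalent listing. This is a sketch using that separate library (not the Datalab magic); it assumes the library is installed and application credentials are configured:


In [ ]:
from google.cloud import storage

client = storage.Client()
bucket = client.bucket('cloud-datalab-samples')

# Filter by name prefix, the closest analogue to the wildcard pattern above.
for blob in bucket.list_blobs(prefix='udf'):
    print(blob.name, blob.size, blob.updated)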

Creating


In [14]:
# Some code to determine a unique bucket name for the purposes of the sample
from google.datalab import Context
import random, string

project = Context.default().project_id
suffix = ''.join(random.choice(string.ascii_lowercase) for _ in range(5))  # string.lowercase is Python 2 only
sample_bucket_name = project + '-datalab-samples-' + suffix
sample_bucket_path = 'gs://' + sample_bucket_name
sample_bucket_object = sample_bucket_path + '/Hello.txt'

print('Bucket: ' + sample_bucket_path)
print('Object: ' + sample_bucket_object)


Bucket: gs://mysampleproject-datalab-samples-abcde
Object: gs://mysampleproject-datalab-samples-abcde/Hello.txt

NOTE: In the examples below, the variables are referenced in the command using $ syntax, since the names are derived from the current project. In your own scenarios, you can use literal values instead of variables if those values are constant.


In [15]:
%%gcs create --bucket $sample_bucket_path

In [16]:
%%gcs list --objects $sample_bucket_path


Out[16]:

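The listing is empty because the bucket was only just created. For comparison, creating the bucket with the standalone google-cloud-storage client library would look roughly like this (a sketch reusing the project and sample_bucket_name variables defined above; bucket names are globally unique, so the call fails if the name is already taken):


In [ ]:
from google.cloud import storage

client = storage.Client(project=project)

# Raises google.cloud.exceptions.Conflict if the bucket name is taken.
bucket = client.create_bucket(sample_bucket_name)
print('Created ' + bucket.name)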
In [17]:
%%gcs copy --source gs://cloud-datalab-samples/hello.txt --destination $sample_bucket_object

In [18]:
%%gcs list --objects $sample_bucket_path


Out[18]:
Name | Type | Size (bytes) | Updated
Hello.txt | text/plain | 14 | 2017-03-07 10:03:16.808000+00:00
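The copy above also has a client-library equivalent. This sketch again uses the standalone google-cloud-storage library; copy_blob performs the copy server side, so the object's bytes are never downloaded:


In [ ]:
from google.cloud import storage

client = storage.Client()
src_bucket = client.bucket('cloud-datalab-samples')
dst_bucket = client.bucket(sample_bucket_name)

# Server-side copy: the data never leaves Cloud Storage.
src_bucket.copy_blob(src_bucket.blob('hello.txt'), dst_bucket, new_name='Hello.txt')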

Reading and Writing


In [19]:
%%gcs view --object $sample_bucket_object


Out[19]:
'Hello World!\n\n'

In [20]:
%%gcs read --object $sample_bucket_object --variable text

In [22]:
print(text)


Hello World!



In [23]:
text = 'Hello World!\n====\n'

In [24]:
%%gcs write --variable text --object $sample_bucket_object

In [25]:
%%gcs list --objects $sample_bucket_path


Out[25]:
Name | Type | Size (bytes) | Updated
Hello.txt | text/plain | 18 | 2017-03-07 10:03:49.868000+00:00
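Reads and writes map onto the standalone client as well. In this sketch (which assumes a reasonably recent version of google-cloud-storage, where download_as_text is available), note that writing uploads a complete replacement object, consistent with objects being immutable:


In [ ]:
from google.cloud import storage

client = storage.Client()
blob = client.bucket(sample_bucket_name).blob('Hello.txt')

text = blob.download_as_text()   # read the whole object into a string
# Writing replaces the whole object; there is no in-place append or edit.
blob.upload_from_string(text + '----\n', content_type='text/plain')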

Deleting


In [26]:
%%gcs delete --object $sample_bucket_object

In [27]:
%%gcs delete --bucket $sample_bucket_path
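A bucket must be empty before it can be deleted, which is why the object is deleted first. The equivalent cleanup with the standalone google-cloud-storage client is another short sketch:


In [ ]:
from google.cloud import storage

client = storage.Client()
bucket = client.bucket(sample_bucket_name)

bucket.blob('Hello.txt').delete()  # delete the object first...
bucket.delete()                    # ...because only an empty bucket can be deleted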

Looking Ahead

The above Cloud Storage commands build on the Storage APIs included in Datalab. Another notebook demonstrates these APIs.

Also, BigQuery functionality supports exporting data to and importing data from Cloud Storage, as shown in the BigQuery tutorials.
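As a preview of that notebook, the Datalab Python API is roughly shaped like the sketch below. The names used here (Bucket, objects, key) are recalled from the google.datalab.storage module and should be checked against the Storage APIs notebook:


In [ ]:
import google.datalab.storage as storage

# NOTE: treat these method names as assumptions; the Storage APIs notebook
# is the authoritative reference for this API.
sample_bucket = storage.Bucket('cloud-datalab-samples')
for obj in sample_bucket.objects():
    print(obj.key)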